
Wake County – Environmental Health & Safety Division

"Environmental Health & Safety promotes public health through plan review, permitting and inspections of food service establishments, child day care facilities, nursing homes, hotels, public pools and tattoo artists."

- Wake County Government Website 


Inspector-to-Restaurant Ratio in Wake County (2016-Present)


Background

Problem

Restaurant inspection violations, including critical violations, are increasing year over year (YoY) in Wake County. This presents a threat to public health and food safety.


Opportunity

With a limited, fixed pool of resources and inspectors, Wake County would benefit from a solution that flags inspections with a higher risk of critical violations.

Many of those violations include “critical” ones. Critical violations can include storing raw meat near ready-to-eat vegetables, inadequate hand washing, or keeping foods at unsafe temperatures, among other examples.


Impact Metrics

Sourcing Our Data

Wake County Restaurants from Wake Gov Open Data

Out[13]:
   HSISID      NAME                        ADDRESS1        CITY     POSTALCODE  PHONENUMBER   RESTAURANTOPENDATE  PERMITID  X           Y
0  4092016155  DAILY PLANET CAFE           11 W JONES ST   RALEIGH  27601       1.919708e+10  2012-04-12          2         -78.639431  35.782205
1  4092016161  HIBACHI 88                  3416 POOLE RD   RALEIGH  27610       1.919231e+10  2012-04-18          4         -78.579533  35.767246
2  4092017180  BOND BROTHERS BEER COMPANY  202 E CEDAR ST  CARY     27511       1.919459e+10  2016-03-11          5         -78.778021  35.787986

Restaurant Inspections Data (2016 - Present) from Wake Gov Open Data

Out[14]:
   OBJECTID  HSISID      SCORE  DATE        TYPE        INSPECTOR
0  22332274  4092017542  94.5   2017-04-07  Inspection  Anne-Kathrin Bartoli
1  22332275  4092017542  92.0   2017-11-08  Inspection  Laura McNeill
2  22332276  4092017542  95.0   2018-03-23  Inspection  Laura McNeill

Violations Data (2016 - Present) from Wake Gov Open Data

Out[15]:
   OBJECTID   HSISID      INSPECTDATE  CATEGORY         CRITICAL  SEVERITY  SHORTDESC                           INSPECTEDBY    POINTVALUE  OBSERVATIONTYPE  VIOLATIONTYPE
0  191104519  4092012065  2016-06-07   Approved Source  No        NaN       Food obtained from approved source  Johanna Hill   0.0         OUT              VR
1  191104520  4092017322  2020-07-10   Approved Source  No        NaN       Food obtained from approved source  Lauren Harden  0.0         OUT              NaN
2  191104521  4092030492  2021-06-14   Approved Source  No        NaN       Food obtained from approved source  David Adcock   1.0         OUT              NaN
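
One way to turn these violation rows into a per-inspection signal is to count the critical violations recorded for each restaurant and inspection date. The sketch below is illustrative only: the sample rows are made up, the `"Yes"` value for `CRITICAL` is an assumption (only `"No"` appears in the preview above), and `critical_counts` is a hypothetical helper rather than the notebook's actual code.

```python
from collections import defaultdict


def critical_counts(violations):
    """Count critical violations per (HSISID, INSPECTDATE) pair."""
    counts = defaultdict(int)
    for row in violations:
        key = (row["HSISID"], row["INSPECTDATE"])
        # Assumption: critical rows are marked with the string "Yes".
        counts[key] += 1 if row["CRITICAL"] == "Yes" else 0
    return dict(counts)


# Made-up rows that mimic the columns shown above.
sample_violations = [
    {"HSISID": 4092012065, "INSPECTDATE": "2016-06-07", "CRITICAL": "No"},
    {"HSISID": 4092012065, "INSPECTDATE": "2016-06-07", "CRITICAL": "Yes"},
    {"HSISID": 4092012065, "INSPECTDATE": "2016-06-07", "CRITICAL": "Yes"},
    {"HSISID": 4092017322, "INSPECTDATE": "2020-07-10", "CRITICAL": "No"},
]

counts = critical_counts(sample_violations)
# Hypothetical binary target: did the inspection record any critical violation?
has_critical = {key: n > 0 for key, n in counts.items()}
```

Keying on both `HSISID` and `INSPECTDATE` keeps repeat inspections of the same restaurant separate, which matters if each inspection is treated as its own prediction target.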

Weather Data (Avg hourly temperature for the day, 2016 - Present) from NOAA

Out[16]:
date TAVG
0 2016-01-01 49.0
1 2016-01-02 43.0
2 2016-01-03 40.0

Yelp Data (Yelp Fusion API - Search by term/phone)

Out[17]:
   name                          review_count  rating  price  phone        display_phone   category_title
0  Peace China                   63            3.5     1      19196769968  (919) 676-9968  chinese
1  Asian Cafe                    7             3.0     2      19196769968  (919) 676-9968  chinese
2  Northside Bistro & Cocktails  23            4.5     -1     19198905225  (919) 890-5225  american (new)
3  The Daily Planet Cafe         89            4.0     2      19197078060  (919) 707-8060  cafes
4  Hibachi 88                    46            3.5     1      19192311688  (919) 231-1688  japanese
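
Because the restaurant table stores `PHONENUMBER` as a float while Yelp returns a digit string, matching the two sources by phone needs a normalization step. The following is a minimal sketch of that idea, not the notebook's actual join: `normalize_phone` is a hypothetical helper, and the sample rows assume the Daily Planet Cafe's float value corresponds to the Yelp number shown above.

```python
import math


def normalize_phone(value):
    """Return an 11-digit US phone string such as '19197078060', or None."""
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        value = int(value)  # 19197078060.0 -> 19197078060
    digits = "".join(ch for ch in str(value) if ch.isdigit())
    if len(digits) == 10:  # assume a missing US country code
        digits = "1" + digits
    return digits if len(digits) == 11 else None


# Illustrative rows; the full phone value is assumed from the Yelp preview above.
restaurants = [
    {"HSISID": 4092016155, "NAME": "DAILY PLANET CAFE", "PHONENUMBER": 19197078060.0}
]
yelp_rows = [{"name": "The Daily Planet Cafe", "phone": "19197078060", "rating": 4.0}]

# Index Yelp rows by normalized phone; a list keeps businesses that share a number.
yelp_by_phone = {}
for row in yelp_rows:
    yelp_by_phone.setdefault(normalize_phone(row["phone"]), []).append(row)

matches = {
    r["HSISID"]: yelp_by_phone.get(normalize_phone(r["PHONENUMBER"]), [])
    for r in restaurants
}
```

Keeping a list per phone number is deliberate: in the preview above, Peace China and Asian Cafe share the same number, so a phone match alone may return more than one Yelp business.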

Daily Police Incidents as proxy for Crime from Wake Gov Open Data

Out[18]:
   OBJECTID  crime_category  crime_code  crime_description                         city     reported_date  reported_year  reported_month  reported_day  reported_dayofwk
0  12001     MISCELLANEOUS   81H         Miscellaneous/Missing Person (18 & over)  RALEIGH  2017-01-15     2017           1               14            Saturday
1  12002     MISCELLANEOUS   81A         Miscellaneous/All Other Non-Offenses      RALEIGH  2017-07-29     2017           7               29            Saturday
2  12003     MISCELLANEOUS   81F         Miscellaneous/Mental Commitment           RALEIGH  2016-03-07     2016           3               6             Sunday

Understanding our data


EDA & Profiling

In [20]:
ProfileReport(final_features_df, title="Feature Profiling Report", explorative=True)

Understanding the features

Why did we extract category as a feature?

Parallel Categories Diagram (a.k.a. alluvial diagram)

Modeling & Evaluation

Out[24]:
F1 Score: 0.6791808873720137
Accuracy: 0.7279305354558611
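
For readers unfamiliar with these two metrics, the sketch below shows how accuracy and F1 are derived from true and predicted binary labels. The labels are made up for illustration and `accuracy_and_f1` is a hypothetical helper; it does not reproduce the scores reported above.

```python
def accuracy_and_f1(y_true, y_pred):
    """Compute accuracy and F1 for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return accuracy, f1


# Made-up labels: 1 = inspection had a critical violation, 0 = it did not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
accuracy, f1 = accuracy_and_f1(y_true, y_pred)
```

Accuracy counts every correct call equally, while F1 focuses on the positive class, so the two can diverge when the classes are imbalanced.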

Area Under the Receiver Operating Characteristic curve

The Area Under the Curve (AUC) is the measure of the ability of a classifier to distinguish between classes and is used as a summary of the ROC curve. The higher the AUC, the better the performance of the model at distinguishing between the positive and negative classes.
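
AUC can also be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair; the labels and scores are made up, and `roc_auc` is a hypothetical helper rather than the function used in the notebook.

```python
def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(y_true, scores)
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration but slower than the sort-based implementations found in common libraries.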

Interpreting predictions using SHAP

SHAP (SHapley Additive exPlanations) values break down a prediction to show the impact of each feature.
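
To make the "break down a prediction" idea concrete, the sketch below computes exact Shapley values by brute force for a tiny, made-up scoring function. It is not the `shap` library's algorithm and not the notebook's model: `toy_model`, its three features, and the all-zeros baseline are assumptions chosen so the arithmetic is easy to follow.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley values; absent features take their baseline value."""
    n = len(instance)

    def value(subset):
        x = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(x):
    """Hypothetical risk score with one interaction between features 0 and 2."""
    return 0.2 + 0.3 * x[0] + 0.1 * x[1] + 0.2 * x[0] * x[2]


instance = [1, 1, 1]
baseline = [0, 0, 0]
phi = shapley_values(toy_model, instance, baseline)
# Additivity: the contributions sum to prediction minus baseline prediction.
gap = toy_model(instance) - toy_model(baseline)
```

The interaction term is split evenly between the two features involved, and the per-feature contributions add up to the difference between the prediction and the baseline, which is the additive property the SHAP name refers to. Brute force enumerates every feature subset, so it is only practical for a handful of features.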


References

  1. https://www.bizjournals.com/triangle/news/2015/09/23/wake-county-restaurants-critical-health-code
  2. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5302274
  3. https://christophm.github.io/interpretable-ml-book/shap.html
  4. https://data.wakegov.com/
  5. https://fusion.yelp.com/
  6. https://www.noaa.gov/weather
  7. http://web.pdx.edu/~newsomj/pa551/lectur15.htm
  8. https://wake-nc.healthinspections.us/
  9. Wake County Environmental Health & Safety Division Officials

Thank You!

Questions/Suggestions?